Abstract: Many functional elements of human homes and 
workplaces consist of rigid components which are connected 
through one or more sliding or rotating linkages. Examples 
include doors and drawers of cabinets and appliances; laptops; 
and swivel office chairs. A robotic mobile manipulator would 
benefit from the ability to acquire kinematic models of such 
objects from observation. This paper describes a method by 
which a robot can acquire an object model by capturing depth 
imagery of the object as a human moves it through its range of 
motion. We envision that in the future, a machine newly introduced 
to an environment could be shown by its human user the 
articulated objects particular to that environment, inferring from 
these “visual demonstrations” enough information to actuate each 
object independently of the user. 

Our method employs sparse (markerless) feature tracking, 
motion segmentation, component pose estimation, and articulation 
learning; it does not require prior object models. Using 
the method, a robot can observe an object being exercised, infer 
a kinematic model incorporating rigid, prismatic and revolute 
joints, then use the model to predict the object’s motion from 
a novel vantage point. We evaluate the method’s performance, 
and compare it to that of a previously published technique, for 
a variety of household objects. 

I. Introduction 

A long-standing challenge in robotics is to endow robots 
with the ability to interact effectively with the diversity of 
objects common in human-made environments. Existing approaches 
to manipulation often assume that objects are simple 
and drawn from a small set. The models are then either predefined 
or learned from training, for example requiring fiducial 
markers on object parts, or prior assumptions about object 
structure. Such requirements may not scale well as the number 
and variety of objects increases. This paper describes a method 
with which robots can learn kinematic models for articulated 
objects in situ, simply by observing a user manipulate the 
object. Our method learns open kinematic chains that involve 
rigid linkages, and prismatic and revolute motions, between 
parts. 

There are three primary contributions of our approach that 
make it effective for articulation learning. First, we propose 
a feature tracking algorithm designed to perceive articulated 
motions in unstructured environments, avoiding the need to 
embed fiducial markers in the scene. Second, we describe a 
motion segmentation algorithm that uses kernel-based clustering 
to group feature trajectories arising from each object 
part. A subsequent optimization step recovers the 6-DOF pose 
of each object part. Third, the method enables use of the 
learned articulation model to predict the object’s motion when 
it is observed from a novel vantage point. Figure 1 illustrates 
a scenario where our method learns kinematic models for 
a refrigerator and microwave from separate user-provided 
demonstrations, then predicts the motion of each object in 
a subsequent encounter. We present experimental results that 
demonstrate the use of our method to learn kinematic models 
for a variety of everyday objects, and compare our method’s 
performance to that of the current state of the art. 

Fig. 1: The proposed framework reliably learns the underlying 
kinematic model of multiple articulated objects from user-provided 
visual demonstrations, and subsequently predicts 
their motions at future encounters. 

II. Related Work 

Providing robots with the ability to learn models of articulated 
objects requires a range of perceptual skills such 
as object tracking, motion segmentation, pose estimation, 
and model learning. It is desirable for robots to learn these 
models from demonstrations provided by ordinary users. This 
necessitates the ability to deal with unstructured environments 
and estimate object motion without requiring tracking markers. 
Traditional tracking algorithms such as KLT [?], or those 
based on SIFT [?], depend on sufficient object texture and 
may be susceptible to drift when employed over an object’s 
full range of motion. Alternatives such as large-displacement 
optical flow [4] or particle video methods [13] tend to be more 
accurate but require substantially more computation. 












Fig. 2: Articulation learning architecture. 


Articulated motion understanding generally requires a combination 
of motion tracking and segmentation. Existing motion 
segmentation algorithms use feature-based trackers to construct 
spatio-temporal trajectories from sensor data, and cluster these 
trajectories based on rigid-body motion constraints. Recent 
work by Brox and Malik [?] in segmenting feature trajectories 
has shown promise in analyzing and labeling motion profiles 
of objects in video sequences in an unsupervised manner. 
Recent work by Elhamifar and Vidal [5] has proven effective 
at labeling object points based purely on motion visible in a 
sequence of standard camera images. Our framework employs 
similar techniques, and introduces a segmentation approach for 
features extracted from RGB-D data. 

Researchers have studied the problem of learning models 
from visual demonstration. Yan and Pollefeys [24] and Huang 
et al. [10] employ structure from motion techniques to segment 
the articulated parts of an object, then estimate the prismatic 
and rotational degrees of freedom between these parts. These 
methods are sensitive to outliers in the feature matching step, 
resulting in significant errors in pose and model estimates. 
Closely related to our work, Katz et al. [?] consider the 
problem of extracting segmentation and kinematic models 
from interactive manipulation of an articulated object. They 
take a deterministic approach, first assuming that each object 
linkage is prismatic, then fitting a rotational degree of 
freedom only if the residual is above a specified threshold. 
Katz et al. learn from observations made in clean, clutter-free 
environments and primarily consider objects in close 
proximity to the RGB-D sensor. Recently, Katz et al. [?] 
proposed an improved learning method that has equally good 
performance with reduced algorithmic complexity. However, 
the method does not explicitly reason over the complexity of 
the inferred kinematic models, and tends to over-fit to observed 
motion. In contrast, our algorithm targets in situ learning in 
unstructured environments with probabilistic techniques that 
provide robustness to noise. Our method builds on the work of 
Sturm et al. [22], which used a probabilistic approach to reason 
over the likelihood of the observations while simultaneously 
penalizing complexity in the kinematic model. Their work 
differs from ours in two main respects: they require that 
fiducial markers be placed on each object part in order to 
provide nearly noise-free observations; and they assume that 
the number of unique object parts is known a priori. 

III. Articulation Learning From Visual 
Demonstration 

This section introduces the algorithmic components of our 
method. Figure 2 illustrates the steps involved. 

Our approach consists of a training phase and a prediction 
phase. The training phase proceeds as follows: (i) Given RGB-D 
data, a feature tracker constructs long-range feature trajectories 
in 3-D. (ii) Using a relative motion similarity metric, 
clusters of rigidly moving feature trajectories are identified. 
(iii) The 6-DOF motion of each cluster is then estimated 
using 3-D pose optimization. (iv) Given a pose estimate for 
each identified cluster, the most likely kinematic structure and 
model parameters for the articulated object are determined. 
Figure 3 illustrates the steps involved in the training phase 
with inputs and outputs for each component. 



Fig. 3: The training phase. 

Once the kinematic model of an articulated object is learned, 
our system can predict the motion trajectory of the object 
during future encounters. In the prediction phase: (i) Given 
RGB-D data, the description of the objects in the scene, 
$D_{query}$, is extracted using SURF [1] descriptors. (ii) Given 
a set of descriptors $D_{query}$, the best-matching object and 
its kinematic model, $\hat{G}, \hat{M}_{ij}, \hat{\theta}_{ij}, (ij) \in \hat{G}$, are retrieved. 
(iii) From these correspondences and the kinematic model 
parameters of the matching object, the object’s articulated 
motion is predicted. Figure 4 illustrates the steps involved in 
the prediction phase. 

Fig. 4: The prediction phase. 


A. Spatio-Temporal Feature Tracking 

The first step in articulation learning from visual demonstration 
involves visually observing and tracking features on 
the object while it is being manipulated. We focus on unstructured 
environments without fiducial markers. Our algorithm 
combines interest-point detectors and feature descriptors with 
traditional optical flow methods to construct long-range feature 
trajectories. We employ Good Features To Track (GFTT) [20] 
to initialize up to 1500 salient features with a quality level of 
0.04 or greater, across multiple image scales. Once the features 
are detected, we populate a mask image that captures regions 
where interest points are detected at each pyramid scale. We 
use techniques from previous work on dense optical flow [7] to 
predict each feature at the next timestep. Our implementation 
also employs median filtering as suggested by Wang et al. [23] 
to reduce false positives. 

We bootstrap the detection and tracking steps with a feature 
description step that extracts and learns the description of 
the feature trajectory. At each image scale, we compute the 
SURF descriptor [1] over features that were predicted from 
the previous step, denoted as $\hat{f}_t$, and compare them with 
the description of the detected features at time $t$, denoted 
as $f_t$. Subsequently, detected features $f_t$ that are sufficiently 
close to predicted features $\hat{f}_t$ and that successfully meet a 
desired match score are added to the feature trajectory, while 
the rest are pruned. To combat drift, we use the detection 
mask as a guide to reinforce feature predictions with feature 
detections. Additionally, we incorporate flow failure detection 
techniques [?] to reduce drift in feature trajectories. 

Like other feature-based methods [?], our method requires 
visual texture. In typical video sequences, some features are 
continuously tracked, while other features are lost due to 
occlusion or lack of image saliency. To provide rich trajectory 
information, we continuously add features to the scene as 
needed. We maintain a constant number of tracked feature 
trajectories by adding newly detected features in regions that are 
not yet occupied. From RGB-D depth information, image-space 
feature trajectories can be easily extended to 3-D. As 
a result, each feature key-point is represented by its normalized 
image coordinates $(u, v)$, position $p \in \mathbb{R}^3$ and surface 
normal $n$, represented as $(p, n) \in \mathbb{R}^3 \times SO(2)$. We denote 
$F = \{F_1, \ldots, F_n\}$ as the resulting set of feature trajectories 
constructed, where $F_i = \{(p_1, n_1), \ldots, (p_t, n_t)\}$. To combat 
noise inherent in our consumer-grade RGB-D sensor, we post-process 
the point cloud with a fast bilateral filter [?] with 
parameters $\sigma_s = 20$ px, $\sigma_r = 4$ cm. 
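
The extension of image-space features to 3-D can be illustrated with standard pinhole back-projection. The sketch below is only an illustration under stated assumptions: the function name `backproject` and the intrinsics `FX`, `FY`, `CX`, `CY` are hypothetical placeholders (real values come from sensor calibration), and surface normals and filtering are not covered.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with metric depth to a 3-D point in the camera frame."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 sensor (assumed, not from the paper).
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

# One image-space trajectory: (u, v, depth in metres) per frame.
track_2d = [(319.5, 239.5, 1.0), (372.0, 239.5, 1.0)]
track_3d: List[Point3] = [backproject(u, v, d, FX, FY, CX, CY)
                          for u, v, d in track_2d]
```

Features whose depth reading is missing or invalid would have to be skipped or interpolated before this step; the sketch simply rejects them.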


B. Motion Segmentation 

To identify the kinematic relationships among parts in an 
articulated object, we first distinguish the trajectory taken 
by each part. In particular, we analyze the motions of the 
object parts with respect to each other over time, and infer 
whether or not pairs of object parts are rigidly attached. To 
reason over candidate segmentations, we formulate a clustering 
problem to identify the different motion subspaces in which 
the object parts lie. After clustering, similar labels imply rigid 
attachment, while dissimilar labels indicate non-rigid relative 
motion between parts. 

If two features in $\mathbb{R}^3 \times SO(2)$ belong to the same rigid part, 
the relative displacement and angle between the features will 
be consistent over the common span of their trajectories. The 
distribution over the relative change in displacement vectors 
and angle subtended is modeled as a zero-mean Gaussian, 
$\mathcal{N}(\mu, \Sigma) = \mathcal{N}(0, \Sigma)$, where $\Sigma$ is the expected noise covariance 
for rigidly-connected feature pairs. The similarity of two 
feature trajectories can then be defined as: 

$$L(i,j) = \frac{1}{T} \sum_{t \in t_i \cap t_j} \exp\left\{ -\gamma \left( d(x_i^t, x_j^t) - \mu_{d_{ij}} \right)^2 \right\} \qquad (1)$$

where $t_i$ and $t_j$ are the observed time instances of the feature 
trajectories $i$ and $j$ respectively, $T = |t_i \cap t_j|$, $x_i^t$ is the 
observation of feature $i$ at time $t$, and $\gamma$ is 
a parameter characterizing the relative motion of the two 
trajectories. For a pair of 3-D key-point features $p_i$ and $p_j$, 
we estimate the mean relative displacement between a pair of 
points moving rigidly together as: 

$$\mu_{d_{ij}} = \frac{1}{T} \sum_{t \in t_i \cap t_j} d(p_i^t, p_j^t) \qquad (2)$$

where $d(p_i^t, p_j^t) = \|p_i^t - p_j^t\|$. For 3-D key-points, we use 
a fixed $\gamma$ in Eq. (1). Figure 5 illustrates an example of rigid 
and non-rigid motions of feature trajectory pairs, and their 
corresponding distribution of relative displacements. 

For a pair of surface normals $n_i$ and $n_j$, we define the mean 
distance as 

$$\mu_{d_{ij}} = \frac{1}{T} \sum_{t \in t_i \cap t_j} d(n_i^t, n_j^t), \qquad (3)$$

where $d(n_i^t, n_j^t) = 1 - n_i^t \cdot n_j^t$. In this case, we use 
a separate fixed $\gamma$ in Eq. (1). 
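
The trajectory similarity can be sketched in a few lines for the 3-D key-point case. This is an illustrative sketch, not the implementation evaluated here: it assumes the exponent in Eq. (1) penalizes the squared deviation of each pairwise distance from its mean over the common time span, and the value of `gamma` and the toy trajectories are hypothetical.

```python
import math
from typing import Dict, Tuple

Point3 = Tuple[float, float, float]


def trajectory_similarity(traj_i: Dict[int, Point3],
                          traj_j: Dict[int, Point3],
                          gamma: float) -> float:
    """Similarity of two trajectories over their common time steps.

    Assumed form: mean over t of exp(-gamma * (d_t - mean_d)^2), where d_t is
    the Euclidean distance between the two features at time t.
    """
    common = sorted(set(traj_i) & set(traj_j))
    if not common:
        return 0.0  # no temporal overlap, so no evidence of rigidity
    dists = [math.dist(traj_i[t], traj_j[t]) for t in common]
    mean_d = sum(dists) / len(dists)
    return sum(math.exp(-gamma * (d - mean_d) ** 2) for d in dists) / len(dists)


GAMMA = 2500.0  # hypothetical bandwidth (1/m^2), chosen only for illustration

# Two features translating together (rigid) and one moving independently.
traj_a = {t: (0.1 * t, 0.0, 1.0) for t in range(5)}
traj_b = {t: (0.1 * t, 0.03, 1.0) for t in range(5)}
traj_c = {t: (0.0, 0.05 * t, 1.0) for t in range(5)}

rigid_score = trajectory_similarity(traj_a, traj_b, GAMMA)
nonrigid_score = trajectory_similarity(traj_a, traj_c, GAMMA)
```

A pair whose separation stays constant scores close to 1, while a pair whose separation varies scores much lower; the same function applies to surface normals if `math.dist` is replaced by one minus the dot product.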

Since the bandwidth parameter $\gamma$ for a pair of feature trajectories 
can be intuitively predicted from the expected variance 
in relative motions of trajectories, we employ DBSCAN [?], 
a density-based clustering algorithm, to find rigidly associated 
feature trajectories. The resulting cluster assignments are denoted 
as $C = \{C_1, \ldots, C_k\}$, where cluster $C_i$ consists of a 
set of rigidly-moving feature trajectories. 
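
To make the clustering step concrete, the following is a compact, textbook-style DBSCAN over a precomputed dissimilarity matrix. It is a sketch under assumptions: converting similarity to dissimilarity as one minus the similarity, the `eps` and `min_pts` values, and the toy similarity matrix are all hypothetical choices for illustration.

```python
from typing import List, Optional


def dbscan(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Label each item with a cluster index, or -1 for noise."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = [k for k in range(n) if dist[j][k] <= eps]
            if len(nbrs_j) >= min_pts:
                queue.extend(nbrs_j)  # core point: keep growing the cluster
    return [lab if lab is not None else -1 for lab in labels]


# Toy similarity matrix: trajectories 0-2 move together, as do 3-4.
groups = [0, 0, 0, 1, 1]
sim = [[1.0 if a == b else (0.95 if groups[a] == groups[b] else 0.1)
        for b in range(5)] for a in range(5)]
dist = [[1.0 - s for s in row] for row in sim]
labels = dbscan(dist, eps=0.2, min_pts=2)
```

Because density-based clustering does not need the number of clusters in advance, the number of rigid parts falls out of the data rather than being specified beforehand.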

C. Multi-Rigid-Body Pose Optimization 

Given the cluster label assignment for each feature trajectory, 
we subsequently determine the 6-DOF motion of each 
cluster. We define $Z_i^t$ as the set of features belonging to cluster 
$C_i$ at time $t$. Additionally, we define $X = X_1, \ldots, X_k$ as the 
set of SE(3) poses estimated for each of the $k$ clusters considered, 
and $x_i^t \in X_i$ as the SE(3) pose estimated for the $i$th cluster 
at time $t$. 

Fig. 5: Histogram of observed distances between a pair of 
trajectories accumulated over one demonstration. (Left) The 
distribution of observed distances is centered at $\mu = 0.029$ m 
with $\sigma = 0.001$ m, indicating rigid-body motion. (Right) 
Larger variation in observed distances, with $\sigma = 0.018$ m, 
indicates non-rigid motion. 

For each cluster $C_i$, we consider the synchronized sensor 
observations of position and surface normals for each of its 
trajectories, and use the arbitrary pose $x_i^0$ as the reference 
frame for the remaining pose estimates of the $i$th cluster. 
Subsequently, we compute the relative transformation $\Delta_i^{t-1,t}$ 
between successive time steps $t-1$ and $t$ for the $i$th cluster 
using the known correspondences between $Z_i^{t-1}$ and $Z_i^t$. 

Since this step can lead to drift, we add an additional sparse 
set of relative pose constraints every 10 frames, denoted 
as $\Delta_i^{t-10,t}$. Our implementation employs a correspondence 
rejection step that eliminates outliers falling outside the inlier 
distance threshold of 1 cm, as in RANSAC [?], making the 
pose estimation routine more robust to sensor noise. 
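
The idea of estimating a relative transformation from known correspondences while rejecting outliers beyond a 1 cm threshold can be sketched in a simplified planar setting. The example below is an assumption-laden illustration, not the SE(3) routine used in this work: it fits a 2-D rotation and translation in closed form, and it tries every pair of correspondences as a minimal sample rather than sampling randomly as RANSAC does. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Pose2 = Tuple[float, float, float]  # (theta, tx, ty)


def fit_rigid_2d(src: Sequence[Point2], dst: Sequence[Point2]) -> Pose2:
    """Closed-form least-squares rotation and translation mapping src to dst."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - csx, ay - csy, bx - cdx, by - cdy
        s_cos += ax * bx + ay * by   # accumulates |a||b| cos(theta)
        s_sin += ax * by - ay * bx   # accumulates |a||b| sin(theta)
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)


def transform(pose: Pose2, p: Point2) -> Point2:
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def robust_relative_pose(src: Sequence[Point2], dst: Sequence[Point2],
                         inlier_thresh_m: float = 0.01) -> Tuple[Pose2, List[int]]:
    """Pick the largest consensus set over all pair hypotheses, then refit."""
    best: List[int] = []
    for a, b in combinations(range(len(src)), 2):
        pose = fit_rigid_2d([src[a], src[b]], [dst[a], dst[b]])
        inliers = [k for k in range(len(src))
                   if math.dist(transform(pose, src[k]), dst[k]) <= inlier_thresh_m]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("too few inliers to estimate a pose")
    return fit_rigid_2d([src[k] for k in best], [dst[k] for k in best]), best


true_pose = (0.1, 0.02, -0.01)
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
dst = [transform(true_pose, p) for p in src]
dst[4] = (dst[4][0] + 0.05, dst[4][1])  # corrupt one correspondence by 5 cm
pose, inliers = robust_relative_pose(src, dst)
```

In 3-D the same structure applies, with the closed-form planar fit replaced by an SVD-based rigid alignment; exhaustive pair enumeration is only practical for small feature sets.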

We augment the estimation step with an optimization phase 
to provide smooth and continuous pose estimates for each 
cluster by incorporating a motion model. We use the 3-D 
pose optimizer iSAM [?] to incorporate the relative pose 
constraints within a factor graph, with node factors derived 
directly from the pose estimates. A constant-velocity edge factor 
term is also added to provide continuity in the articulated 
motion. 

D. Articulation Learning 

Once the 6-DOF pose estimates of the individual object 
parts are computed, the kinematic model of the full articulated 
object is determined using tools developed in Sturm et al. [22]. 
Given multiple 6-DOF pose observations of object parts, the 
problem is to estimate the most likely kinematic configuration 
for the articulated object. Formally, given the observed poses 
$\mathcal{D}_z$, we estimate the kinematic graph configuration $\hat{G}$ that 
maximizes the posterior probability 

$$\hat{G} = \arg\max_{G} \; p(G \mid \mathcal{D}_z) \qquad (4)$$

We employ notation similar to that of Sturm et al. [22] to 
denote the relative transformation between two object parts $i$ 
and $j$ as $\Delta_{ij} = x_i \ominus x_j$, using standard motion composition 
operator notation [?]. The kinematic model between parts $i$ 
and $j$ is then defined as $M_{ij}$, with its associated parameter 
vector $\theta_{ij} \in \mathbb{R}^{p_{ij}}$, where $p_{ij}$ is the number of parameters 
associated with the description of the link. We construct a 
graph $G = (V_G, E_G)$ consisting of a set of vertices $V_G = 
\{1, \ldots, k\}$ that denote the object parts involved in the articulated 
object, and a set of undirected edges $E_G \subset V_G \times V_G$ describing 
the kinematic linkage between two object parts. 

As in Sturm et al. [22], we simplify the problem to recognize 
only kinematic trees of high posterior probability, in order to 
reformulate the problem as Eq. (8) below: 

$$\hat{G} = \arg\max_{G} \; p(G \mid \mathcal{D}_z) \qquad (5)$$

$$= \arg\max_{G} \; p(\{(M_{ij}, \theta_{ij}) \mid (ij) \in E_G\} \mid \mathcal{D}_z) \qquad (6)$$

$$= \arg\max_{G} \prod_{(ij) \in E_G} p(M_{ij}, \theta_{ij} \mid \mathcal{D}_z) \qquad (7)$$

$$= \arg\max_{G} \sum_{(ij) \in E_G} \log p(M_{ij}, \theta_{ij} \mid \mathcal{D}_z) \qquad (8)$$

where $\mathcal{D}_z = (\Delta_{ij}^1, \ldots, \Delta_{ij}^t) \;\; \forall (ij) \in E_G$ is the sequence of 
observed relative transformations between parts $i$ and $j$. 

Since we are particularly interested in household objects, 
we focus on kinematic models involving rigid, prismatic, and 
revolute linkages. We then estimate the parameters $\theta \in \mathbb{R}^p$ that 
maximize the data likelihood of the object pose observations 
given the kinematic model: 

$$\hat{\theta} = \arg\max_{\theta} \; p(\mathcal{D}_z \mid M, \theta) \qquad (9)$$

Once we fit each candidate kinematic model to the given 
observation sequence, we select the kinematic model that 
best explains the data. Specifically, we compute the posterior 
probability of each kinematic model, given the data, as: 

$$p(M \mid \mathcal{D}_z) = \int \frac{p(\mathcal{D}_z \mid M, \theta) \, p(\theta \mid M) \, p(M)}{p(\mathcal{D}_z)} \, d\theta \qquad (10)$$


Because this posterior is expensive to evaluate, we 
approximate it with the BIC score: 

$$BIC(M) = -2 \log p(\mathcal{D}_z \mid M, \hat{\theta}) + p \log n, \qquad (11)$$

where $p$ is the number of parameters involved in the kinematic 
model, $n$ is the number of observations in the data set, and $\hat{\theta}$ 
is the maximum likelihood parameter vector. The model that 
best explains the observations is therefore the one with the 
lowest BIC score. 
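
Model selection by BIC can be illustrated with a short sketch. The log-likelihoods and parameter counts below are hypothetical numbers chosen only to show the trade-off between fit quality and model complexity; they are not results from our experiments, and the helper names are assumptions.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """BIC = -2 log-likelihood + p log n; lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)


def select_model(candidates: Dict[str, Tuple[float, int]], num_obs: int) -> str:
    """Return the name of the candidate with the lowest BIC score.

    `candidates` maps a model name to (max log-likelihood, parameter count).
    """
    return min(candidates,
               key=lambda name: bic(candidates[name][0], candidates[name][1], num_obs))


# Hypothetical fits of one part pair over n = 100 relative-pose observations.
candidates = {
    "rigid": (-250.0, 6),       # poor fit, few parameters
    "prismatic": (-120.0, 9),   # good fit, moderate complexity
    "revolute": (-118.0, 12),   # marginally better fit, more parameters
}
best_model = select_model(candidates, num_obs=100)
```

Here the revolute model fits slightly better than the prismatic one, but not by enough to pay for its extra parameters, so the prismatic model is selected.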

The kinematic structure selection problem is subsequently 
reduced to computing the minimum spanning tree of the graph 
with edges defined by $cost_{ij} = -\log p(M_{ij}, \theta_{ij} \mid \mathcal{D}_{z_{ij}})$. 
The resulting minimum spanning kinematic tree weighted 
by BIC scores is the most likely kinematic model for the 
articulated object given the pose observations. For a more 
detailed description, we refer the reader to Sturm et al. 
[22]. Figure 6 shows a few examples of kinematic structures 
extracted given pose estimates as described in the previous 
section. Restricting linkage types to rigid, prismatic, 
and rotational excludes household objects with more complex 
kinematics, such as lamps, garage doors, and toys. 
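
The reduction to a minimum spanning tree can be sketched with Kruskal's algorithm and a union-find structure. This is an illustrative sketch rather than the implementation used here; the part indices and edge costs are hypothetical, standing in for the per-edge costs of the best-fitting link model between each pair of parts.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def minimum_spanning_tree(num_parts: int, costs: Dict[Edge, float]) -> List[Edge]:
    """Kruskal's algorithm: add the cheapest edges that do not form a cycle."""
    parent = list(range(num_parts))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree: List[Edge] = []
    for (i, j), _cost in sorted(costs.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical object with three parts: 0 = cabinet body, 1 = door, 2 = drawer.
edge_costs = {
    (0, 1): 281.4,  # body-door: well explained by a single link model
    (0, 2): 300.0,  # body-drawer: well explained by a single link model
    (1, 2): 650.0,  # door-drawer: no simple link explains their relative motion
}
kinematic_tree = minimum_spanning_tree(3, edge_costs)
```

The expensive door-drawer edge is dropped, leaving a tree in which both moving parts attach to the body.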












(a) Rotational DOF of a laptop (b) Prismatic DOF of a drawer 


Fig. 6: Examples of correctly estimated kinematic structure 
from 6-DOF pose estimates of feature trajectories. 

E. Learning to Predict Articulated Motion 

Our daily environment is filled with articulated objects with 
which we repeatedly interact. A robot in our environment can 
identify instances of articulated objects that it has observed in 
the past, then use a learned model to predict the motion of an 
object when it is used. 



(a) Extracted MSER (b) Estimated Motion Manifold 

Fig. 7: The motion manifold of an articulated object, extracted 
via MSERs. 


Once the kinematic model of an articulated object is 
learned, the kinematic structure $\hat{G}$ and its model parameters 
$\hat{M}_{ij}, \hat{\theta}_{ij}, (ij) \in \hat{G}$ are stored in a database, along with its 
appearance model. The feature descriptors extracted (described 
in Section III-A) for each cluster $C_i$ of the articulated object 
are also retained for object recognition in future encounters. 
Demonstrations involving the same instance of the articulated 
object are represented in a single arbitrarily selected 
reference frame, and kept consistent across encounters by 
registering newer demonstrations into the initial object frame. 
Each of these attributes is stored in the bag-of-words driven 
database [?] for convenient querying in the future. Thus, on 
encountering the same object instance in the future, the robot 
can match the descriptors extracted from the current scene 
with those extracted from object instances it learned in the 
past. It then recovers the original demonstration reference 
frame along with the relevant kinematic structure of the 
articulated object for prediction purposes. We identify the 
surface of the manipulated object by extracting Maximally 
Stable Extremal Regions (MSER) [16] (Figure 7) for each 
object part undergoing motion. We use this surface to visualize 
the motion manifold of the articulated object. 


IV. Experiments and Analysis 


Our experimental setup consists of a single sensor providing 
RGB-D depth imagery. Each visual demonstration involved a 
human manipulating an articulated object and its parts at a 
normal pace, while avoiding occlusion of the object from the 
robot’s perspective. Demonstrations were performed for multiple 
robot viewpoints, to capture variability in depth imagery. 
We performed 43 demonstration sessions by manipulating a 
variety of household objects: refrigerators, doors, drawers, 
laptops, chairs, etc. Each demonstration was recorded for about 
30-60 seconds. April tags [?] were used to recover ground 
truth estimates of each articulated object’s motion, which we 
adopted as a baseline for evaluation. In order to avoid any 
influence on our method of observations arising from fiducial 
markers, the RGB-D input was pre-processed to mask out 
regions containing the tags. 

We then compared the pose estimation, model selection 
and estimation performance of our method to that of an alternative 
state-of-the-art method (re-implemented by us based 
on [?]), and to traditional methods using fiducial markers. 
We incorporated several improvements [12], [18] to Katz’s 
algorithm, as previously described in Section III-A, to enable 
fair comparison with our proposed method. 


A. Qualitative and Overall Performance 

Figure 8 shows the method in operation for household 
objects including a laptop, a microwave, a refrigerator and a 
drawer. Tables I and II compare the performance of our method 
in estimating the kinematic model parameters for several 
articulated objects observed from a variety of viewpoints. Our 
method recovered a correct model for more objects than Katz’s 
method and, for almost every object tested, recovered model 
parameters more accurately. 


B. Pose Estimation Accuracy 

For each visual demonstration, we compared the segmentation 
and SE(3) pose of each object part estimated by our 
method with those produced by Katz. We also obtained pose 
estimates for each object part by tracking attached fiducial 
markers. Synchronization across pose observations was ensured 
by evaluating only poses in the set intersection of the 
timestamps of each pose sequence. For each overlapping time 
step, we compared the relative pose of the estimated object 
segment obtained from both algorithms with that obtained 
via fiducial markers (Figure 9). For consistency in evaluation, 
the SE(3) poses of individual object parts were initialized 
identically for both algorithms. 
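
The synchronization by timestamp intersection can be sketched as follows. This is a minimal illustration that covers only the positional part of the error; the orientation error would be computed analogously on the rotational component. Integer timestamps (for example frame indices), the function name, and the toy data are assumptions made for the example.

```python
import math
from typing import Dict, Tuple

Point3 = Tuple[float, float, float]


def mean_position_error(estimated: Dict[int, Point3],
                        reference: Dict[int, Point3]) -> float:
    """Average Euclidean error over time steps present in both sequences."""
    common = sorted(set(estimated) & set(reference))
    if not common:
        raise ValueError("pose sequences share no timestamps")
    return sum(math.dist(estimated[t], reference[t]) for t in common) / len(common)


# Hypothetical part positions (metres), keyed by integer frame index.
estimated = {0: (0.0, 0.0, 1.0), 1: (0.1, 0.0, 1.0), 2: (0.2, 0.0, 1.0)}
reference = {1: (0.1, 0.02, 1.0), 2: (0.2, 0.0, 1.0), 3: (0.3, 0.0, 1.0)}

error_m = mean_position_error(estimated, reference)  # frames 1 and 2 only
```

Frames observed by only one of the two sources are ignored, so a tracker that drops out briefly is scored only on the frames where it reports a pose.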

Figure 10 compares the absolute SE(3) poses estimated 
by the three methods described above, given observations of a 
chair being moved on the ground plane. Figure 10(a) illustrates 
a scenario in which both algorithms, ours and Katz’s, perform 
reliably. Katz’s method is within 2.0 cm and 2.6°, on average, 
of the ground truth pose produced with fiducial markers. Our 
method achieves comparable average accuracy of 1.7 cm and 
2.1°. Using data from another demonstration, Katz’s method 
failed to track the object motion robustly, resulting in drift and 
incorrect motion estimates (Figure 10(b)). Such failures can be 
attributed to: (i) the KLT tracker, which is known to cause drift 
during feature tracking; and (ii) SVD least squares minimization in 
the relative pose estimation stage, without appropriate outlier 
rejection. 

Fig. 8: Articulation learning and motion prediction for various objects. 


For a variety of articulated objects (Table I), our method 
achieves average accuracies of 2.4 cm and 4.7° with respect 
to ground truth estimated from noisy Kinect RGB-D data. In 
comparison, Katz’s method [14] achieved average accuracies 
of 3.7 cm and 10.1° for the same objects. Our method 
achieved an average error of less than 10 cm and 25° in 37 
of 43 demonstrations, vs. 23 of 43 for Katz. 

Fig. 9: Pose estimation accuracy of our method, compared to 
that achieved using fiducial markers. 


C. Model Estimation Accuracy 

Once the SE(3) poses of the object parts are estimated, 
we compare the kinematic structure and model parameters 
of the articulated object estimated by our method with those 
produced by Katz. As in our other experiments, we use the 
kinematic structure and model parameters identified from fiducial 
marker-based solutions as a baseline. Table II summarizes 
the model estimation and parameter estimation performance 
achieved with our method and Katz’s. The model fit error is 
defined as the average spatial and orientation error between the 
SE(3) observations and the estimated articulation manifold 
(i.e., prismatic or rotational manifold). For the dataset of 
articulated objects evaluated (Table II), our method achieved 
an average model fit error of 1.7 cm spatially, and 5.0° 
in orientation, an improvement over Katz’s method (average 
model fit errors of 2.0 cm and 5.8° respectively). Of 43 
demonstrations evaluated, our method determined the correct 
kinematic structure and accurate parameters in 30 cases, 
whereas Katz did so in only 15 cases. 

We also compared the model parameters estimated by our 
method and Katz’s method with ground truth from markers, by 
transforming poses estimated by both methods into the fiducial 
marker’s reference frame based on the initial configuration 
of the articulated object. This allows us to directly compare 
model parameters estimated through our proposed framework, 
the current state-of-the-art and marker-based solutions. For 
multi-DOF objects, the model parameter error averaged across 
each corresponding object part is reported. In each demonstration, 
the model parameters estimated via our method are closer 
to the marker-based solution than those obtained by Katz. 















(a) Accurate estimation by current state-of-the-art and our framework 


(b) Failed estimation by current state-of-the-art 


Fig. 10: Comparison of SE(3) pose for a chair estimated via fiducial markers (Tag), current state-of-the-art (Katz) and our 
framework (Ours). (a) The figures show the strong performance of our framework, as compared to marker-based solutions and 
current state-of-the-art algorithms, to robustly track and estimate the SE(3) pose of a chair being manipulated on multiple 
occasions. (b) Current state-of-the-art, however, fails to robustly estimate the SE(3) pose on certain trials. 

TABLE I: Comparison of SE(3) pose estimates 
between our framework and current state-of-the-art 
(Katz) with marker-based pose estimates considered 
as ground truth. 


TABLE II: Comparison of kinematic model estimation and parameter 
estimation capability between our framework and current state-of-the- 
art (Katz) with marker-based model estimation considered as ground 
truth. 


V. Conclusion 

We introduced a framework that enables robots to learn 
kinematic models for everyday objects from RGB-D data 
acquired during user-provided demonstrations. We combined 
sparse feature tracking, motion segmentation, object pose 
estimation and articulation learning to learn the underlying 
kinematic structure of the observed object. We demonstrated 
the qualitative and quantitative performance of our method; it 
recovers the correct structure more often, and more accurately, 
than its predecessor in the literature, and achieves accuracy 
similar to that of a marker-based solution. Our framework also 
enables the robot to predict the motion of articulated objects 
it has previously learned. Even given our method’s limitation 
to recovering open kinematic chains involving only rigid, 
prismatic or revolute linkages, its prediction capability may 
be useful in future robotic encounters requiring manipulation. 